Automatic Indexing of Specialized Documents: Using Generic vs. Domain-Specific Document Representations
Authors
Abstract
The shift from paper to electronic documents has caused the curation of information sources in large electronic databases to become more generalized. In the biomedical domain, continuing efforts aim at refining indexing tools to assist with the update and maintenance of databases such as MEDLINE. In this paper, we evaluate two statistical methods of producing MeSH indexing recommendations for the genetics literature, including recommendations involving subheadings, which is a novel application for the methods. We show that a generic representation of the documents yields both better precision and better recall. We also find that a domain-specific representation of the documents can contribute to enhancing recall.
Similar resources
Automatic Workflow Generation and Modification by Enterprise Ontologies and Documents
This article presents a novel method and development paradigm that proposes a general template for an enterprise information structure and allows for the automatic generation and modification of enterprise workflows. This dynamically integrated workflow development approach utilises a conceptual ontology of domain processes and tasks, enterprise charts, and enterprise entities. It also suggests...
Automatic indexing of scanned documents: a layout-based approach
Archiving official written documents such as invoices, reminders and account statements is becoming increasingly important in both business and private settings. Creating appropriate index entries for document archives, such as the sender’s name, creation date or document number, is tedious manual work. We present a novel approach to handle automatic indexing of documents based on generic positional extraction of
Development of Bilingual Domain-Specific Ontology for Automatic Conceptual Indexing
In this paper we describe the development, means of evaluation and applications of the Russian–English Sociopolitical Thesaurus, a linguistic resource developed specifically for automatic text processing applications. The Sociopolitical domain is not a domain of social research but a broad domain of social relations including economic, political, military, cultural, sports and other subdomains. The knowl...
Study of Indexing Techniques to Improve the Performance of Information Retrieval in Telugu Language
Information Retrieval Systems (IRS) are very popular on the World Wide Web. The amount of text information on the web related to all types of objects, such as documents, web pages, images, videos and audio files, is increasing exponentially day by day. When the text repository grows to the maximum extent of the memory size in the server, the methods used to find a particular text unit eithe...
Evaluation of a Meta-1-based automatic indexing method for medical documents.
This paper describes MetaIndex, an automatic indexing program that creates symbolic representations of documents for the purpose of document retrieval. MetaIndex uses a simple transition network parser to recognize a language that is derived from the set of main concepts in the Unified Medical Language System Metathesaurus (Meta-1). MetaIndex uses a hierarchy of medical concepts, also derived f...